Merge our internal changes and fix conflicts for DF branch 50 (#13)
Conversation
* Initial commit
* Fix formatting
* Add across partitions check
* Add a new test case
* Fix buggy test
…#13909) (apache#13934)
* Set utf8view as return type when input type is the same
* Verify that the returned type from call to scalar function matches the return type specified in the return_type function
* Match return type to utf8view

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
This reverts commit 5383d30.
* fix: fetch is missed in the EnforceSorting
* fix conflict
* resolve comments from alamb
* update
…it disabled by default
…e#14415) (apache#14453)
* chore: Fixed CI
* chore
* chore: Fixed clippy
* chore

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
* Test for string / numeric coercion
* fix tests
* Update tests
* Add tests to stringview
* add numeric coercion
```diff
 /// (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`,
 /// and `Binary/BinaryLarge` with `BinaryView`.
-pub schema_force_view_types: bool, default = true
+pub schema_force_view_types: bool, default = false
```
Default to utf8.
```rust
// Whether to allow truncated rows when parsing.
// By default this is set to false and will error if the CSV rows have different lengths.
// When set to true then it will allow records with less than the expected number of columns
pub truncated_rows: Option<bool>, default = None
```
Our CSV truncated_rows support will be included in DF 51.0.0, but not in DF 50.0.0.
apache#17465
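The truncated-rows behavior the new option enables can be sketched as follows. This is a minimal illustration, not DataFusion's reader code: the helper name `pad_truncated_row` and the `Option<String>`-per-cell representation are assumptions for the sketch.

```rust
// Sketch: a record with fewer fields than the schema expects is padded with
// nulls (None); a record with *more* fields is still an error, matching the
// documented behavior. Illustrative only, not the actual CSV reader.
fn pad_truncated_row(fields: &[&str], expected: usize) -> Option<Vec<Option<String>>> {
    if fields.len() > expected {
        return None; // too many columns remains an error
    }
    let mut row: Vec<Option<String>> =
        fields.iter().map(|f| Some(f.to_string())).collect();
    row.resize(expected, None); // fill missing trailing columns with nulls
    Some(row)
}

fn main() {
    // "a,b" against a 3-column schema: the third column becomes null.
    let row = pad_truncated_row(&["a", "b"], 3).unwrap();
    assert_eq!(row, vec![Some("a".to_string()), Some("b".to_string()), None]);
}
```

Note that, as the doc comment says, a non-nullable schema would still reject the padded nulls downstream.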
```rust
    DecoderDeserializer::new(CsvDecoder::new(decoder))
}

fn csv_deserializer_with_truncated(
```
Our CSV truncated_rows support will be included in DF 51.0.0, but not in DF 50.0.0.
apache#17465
```rust
/// By default this is set to false and will error if the CSV rows have different lengths.
/// When set to true then it will allow records with less than the expected number of columns and fill the missing columns with nulls.
/// If the record's schema is not nullable, then it will still return an error.
pub truncated_rows: bool,
```
Same as above.
```diff
 assert_eq!(
     string_truncation_stats.max_value,
-    Precision::Inexact(ScalarValue::Utf8View(Some("b".repeat(63) + "c")))
+    Precision::Inexact(Utf8(Some("b".repeat(63) + "c")))
```
We default to utf8.
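For context on why the expected value is `"b".repeat(63) + "c"` rather than all `b`s: when a max statistic is truncated to a length budget, the last kept byte is incremented so the shortened string stays an upper bound, which is also why the precision is `Inexact`. The helper below is an illustrative sketch under that assumption, not DataFusion's statistics code, and it ignores the edge case where the last byte is already at its maximum.

```rust
// Sketch: truncate a max statistic to `max_len` bytes while keeping it an
// upper bound by bumping the last retained byte ('b' -> 'c').
// Illustrative only; assumes ASCII input and a non-0xFF last byte.
fn truncate_max_stat(value: &str, max_len: usize) -> String {
    if value.len() <= max_len {
        return value.to_string();
    }
    let mut bytes = value.as_bytes()[..max_len].to_vec();
    let last = bytes.last_mut().expect("max_len must be > 0");
    *last += 1; // keep truncated value >= every real value
    String::from_utf8(bytes).expect("ASCII input stays valid UTF-8")
}

fn main() {
    // A 100-character max value of all 'b's, truncated to a 64-byte budget:
    let truncated = truncate_max_stat(&"b".repeat(100), 64);
    assert_eq!(truncated, "b".repeat(63) + "c");
}
```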
```diff
-    .await
+    .await?;
+let mut id_annotator = NodeIdAnnotator::new();
+annotate_node_id_for_execution_plan(&physical_plan, &mut id_annotator)
```
Our internal node_id support.
```diff
-let filter = col("date_string_col").eq(lit(ScalarValue::new_utf8view("01/01/09")));
+// xudong: use new_utf8, because schema_force_view_types was changed to false now.
+// qi: when schema_force_view_types setting to true, we should change back to utf8view
+let filter = col("date_string_col").eq(lit(ScalarValue::new_utf8("01/01/09")));
```
Default to utf8.
```rust
    self.sink.metrics()
}

fn with_node_id(
```
Internal node_id support.
```rust
    }
}

fn with_node_id(
```
Internal node_id support.
```rust
/// Infers new predicates by substituting equalities.
/// For example, with predicates `t2.b = 3` and `t1.b > t2.b`,
/// we can infer `t1.b > 3`.
fn infer_predicates_from_equalities(predicates: Vec<Expr>) -> Result<Vec<Expr>> {
```
Related to:
apache#15906
Should we reopen our upstream PR?
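The substitution the doc comment describes can be sketched with a toy predicate type. This is not the upstream implementation from apache#15906; the `Pred` enum and `infer_predicates` name are invented for illustration, and real expressions would of course cover more operators than `=` and `>`.

```rust
// Sketch: given `t2.b = 3` and `t1.b > t2.b`, substituting the constant
// bound by the equality yields the inferred predicate `t1.b > 3`.
#[derive(Clone, Debug, PartialEq)]
enum Pred {
    EqConst(String, i64),  // col = constant
    GtCol(String, String), // left_col > right_col
    GtConst(String, i64),  // col > constant
}

fn infer_predicates(preds: &[Pred]) -> Vec<Pred> {
    let mut inferred = Vec::new();
    for p in preds {
        if let Pred::GtCol(l, r) = p {
            // Find an equality that binds the right-hand column to a constant.
            for q in preds {
                if let Pred::EqConst(col, val) = q {
                    if col == r {
                        inferred.push(Pred::GtConst(l.clone(), *val));
                    }
                }
            }
        }
    }
    inferred
}

fn main() {
    let preds = vec![
        Pred::EqConst("t2.b".to_string(), 3),
        Pred::GtCol("t1.b".to_string(), "t2.b".to_string()),
    ];
    assert_eq!(
        infer_predicates(&preds),
        vec![Pred::GtConst("t1.b".to_string(), 3)]
    );
}
```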
```diff
 }

-context.update_plan_from_children()
+Ok((context.update_plan_from_children()?, fetch))
```
Related to our internal fetch support for enforce_distribution.
```rust
        data,
        children,
    },
    mut fetch,
```
Related to our internal fetch support for enforce_distribution.
```rust
// If `fetch` was not consumed, it means that there was `SortPreservingMergeExec` with fetch before.
// It was removed by `remove_dist_changing_operators`
// and we need to add it back.
if fetch.is_some() {
```
Related to our internal fetch support for enforce_distribution.
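The fetch bookkeeping in the hunks above can be sketched with simplified plan types. These enums and function names are invented for the sketch and are not DataFusion's: the point is only that removing a distribution-changing operator hands its `fetch` upward, and an unconsumed `fetch` must be re-attached as a limit so rows are not over-produced.

```rust
// Sketch: `remove_merge` strips a SortPreservingMerge-like node and returns
// its fetch alongside the remaining plan; the caller re-adds a limit when
// the fetch was not consumed by any later rewrite. Illustrative types only.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan,
    SortPreservingMerge { fetch: Option<usize>, child: Box<Plan> },
    Limit { fetch: usize, child: Box<Plan> },
}

fn remove_merge(plan: Plan) -> (Plan, Option<usize>) {
    match plan {
        Plan::SortPreservingMerge { fetch, child } => (*child, fetch),
        other => (other, None),
    }
}

fn main() {
    let plan = Plan::SortPreservingMerge { fetch: Some(10), child: Box::new(Plan::Scan) };
    let (plan, fetch) = remove_merge(plan);
    // The fetch was not consumed during the rewrite, so add it back on top.
    let plan = match fetch {
        Some(n) => Plan::Limit { fetch: n, child: Box::new(plan) },
        None => plan,
    };
    assert_eq!(plan, Plan::Limit { fetch: 10, child: Box::new(Plan::Scan) });
}
```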
```rust
use datafusion_common::DataFusionError;

// Util for traversing ExecutionPlan tree and annotating node_id
pub struct NodeIdAnnotator {
```
Our internal support for node_id.
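The annotator's job can be sketched on a simplified tree (not DataFusion's `ExecutionPlan`; the `Node` struct here and the field names are assumptions): a counter is threaded through a pre-order walk so every node gets a unique, stable id.

```rust
// Sketch of node_id annotation: pre-order traversal assigning sequential ids.
// Simplified tree type for illustration only.
struct Node {
    id: Option<usize>,
    children: Vec<Node>,
}

#[derive(Default)]
struct NodeIdAnnotator {
    next_id: usize,
}

impl NodeIdAnnotator {
    fn annotate(&mut self, node: &mut Node) {
        node.id = Some(self.next_id); // parent gets its id before its children
        self.next_id += 1;
        for child in &mut node.children {
            self.annotate(child);
        }
    }
}

fn main() {
    let mut plan = Node {
        id: None,
        children: vec![
            Node { id: None, children: vec![] },
            Node { id: None, children: vec![] },
        ],
    };
    NodeIdAnnotator::default().annotate(&mut plan);
    assert_eq!(plan.id, Some(0));
    assert_eq!(plan.children[0].id, Some(1));
    assert_eq!(plan.children[1].id, Some(2));
}
```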
```rust
/// Execution plan for values list based relation (produces constant rows)
#[deprecated(
    since = "45.0.0",
    note = "Use `MemorySourceConfig::try_new_as_values` instead"
```
It seems we still use the deprecated API, so I can try to upgrade those cases in a follow-up PR.
```diff
 fn remove_dist_changing_operators(
     mut distribution_context: DistributionContext,
-) -> Result<DistributionContext> {
+) -> Result<(
```
Internal fetch support.
Merge our internal changes and fix conflicts for DF branch 50.